Crowdsourcing Without a Crowd: Reliable Online Species Identification Using Bayesian Models to Minimize Crowd Size
We present an incremental Bayesian model that resolves key issues of crowd size and data quality for consensus labeling. We evaluate our method using data collected from a real-world citizen science program, BeeWatch, which invites members of the public in the United Kingdom to classify (label) photographs of bumblebees as one of 22 possible species. The biological recording domain poses two key and hitherto unaddressed challenges for consensus models of crowdsourcing: (1) the large number of potential species makes classification difficult, and (2) this is compounded by limited crowd availability, stemming from both the inherent difficulty of the task and the lack of relevant skills among the general public. We demonstrate that consensus labels can be reliably found in such circumstances with very small crowd sizes of around three to five users (i.e., through group sourcing). Our incremental Bayesian model, which minimizes crowd size by re-evaluating the quality of the consensus label following each species identification solicited from the crowd, is competitive with a Bayesian approach that uses a larger but fixed crowd size and outperforms majority voting. These results have important ecological applicability: biological recording programs such as BeeWatch can sustain themselves when resources such as taxonomic experts to confirm identifications by photo submitters are scarce (as is typically the case), and feedback can be provided to submitters in a timely fashion. More generally, our model provides benefits to any crowdsourced consensus labeling task where there is a cost (financial or otherwise) associated with soliciting a label.
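The core idea of the incremental approach — update a posterior over species after each solicited label and stop as soon as the consensus is confident enough — can be illustrated with a deliberately simplified sketch. This is not the published BeeWatch model: it assumes a single fixed annotator accuracy and a uniform error distribution over the remaining species, whereas the real model estimates per-user quality.

```python
def incremental_consensus(labels, n_species, accuracy=0.7,
                          threshold=0.95, max_crowd=5):
    """Solicit labels one at a time; stop once the posterior probability of
    the leading species exceeds `threshold` or the crowd budget is spent.

    Hypothetical simplification: every annotator is assumed to label the
    true species with probability `accuracy`, and errors are spread
    uniformly over the other n_species - 1 candidates.
    """
    # Uniform prior over the candidate species.
    posterior = [1.0 / n_species] * n_species
    used = 0
    for label in labels[:max_crowd]:
        used += 1
        wrong = (1.0 - accuracy) / (n_species - 1)
        # Bayes update: multiply by the likelihood of this label, renormalize.
        posterior = [p * (accuracy if s == label else wrong)
                     for s, p in enumerate(posterior)]
        total = sum(posterior)
        posterior = [p / total for p in posterior]
        best = max(range(n_species), key=posterior.__getitem__)
        if posterior[best] >= threshold:
            break  # confident consensus reached: no more labels needed
    best = max(range(n_species), key=posterior.__getitem__)
    return best, used, posterior[best]
```

With 22 species and two annotators agreeing on the same species, the posterior for that species already exceeds 0.99, so the stopping rule fires after only two labels — the mechanism behind crowd sizes of three to five.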
Capturing mink and data: Interacting with a small and dispersed environmental initiative over the introduction of digital innovation
This case study was carried out by Koen Arts, Gemma Webster, Nirwan Sharma, Yolanda Melero, Chris Mellish, Xavier Lambin and René van der Wal. We thank two anonymous reviewers for their suggestions, and Chris Horrill from SMI for his very helpful and insightful comments on previous drafts of this manuscript. The research described here is supported by an award made by the RCUK Digital Economy programme to the dot.rural Digital Economy Hub (award reference: EP/G066051/1). Case study for the 'Responsible Research & Innovation in ICT' platform.
ColloCaid: A real-time tool to help academic writers with English collocations
Writing is a cognitively challenging activity that can benefit from lexicographic support. Academic writing in English presents a particular challenge, given the extent of use of English for this purpose. The ColloCaid tool, currently under development, responds to this challenge. It is intended to assist academic English writers by providing collocation suggestions, as well as alerting writers to unconventional collocational choices as they write. The underlying collocational data are based on a carefully curated set of about 500 collocational bases (nouns, verbs, and adjectives) characteristic of academic English, and their collocates with illustrative examples. These data have been derived from state-of-the-art corpora of academic English and academic vocabulary lists. The manual curation by expert lexicographers and the reliance on specifically academic English textual resources are what distinguish ColloCaid from existing collocational resources. A further characteristic of ColloCaid is its strong emphasis on usability. The tool draws on dictionary-user research, findings in information visualization, as well as usability testing specific to ColloCaid in order to find an optimal amount of collocation prompts and the best way to present them to the user.
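The kind of lookup such a tool performs — given a base word, return its curated collocates grouped by grammatical relation, each backed by an illustrative example — can be sketched as follows. The data structure and the entries in it are hypothetical stand-ins for the curated ColloCaid database, not its actual contents or API.

```python
# Hypothetical in-memory slice of a collocation database: each base word
# maps grammatical relations to collocates with illustrative examples.
COLLOCATIONS = {
    "research": {
        "verb + research": {
            "conduct": "They conducted research into surface chemistry.",
            "carry out": "We carried out research on three campuses.",
        },
        "adjective + research": {
            "extensive": "Extensive research supports this claim.",
            "empirical": "Empirical research on this question is still scarce.",
        },
    },
}

def suggest_collocations(base, max_per_group=2):
    """Return up to `max_per_group` collocates per grammatical relation for
    a base word, in the spirit of the prompts ColloCaid shows writers.
    Capping the count per group reflects the usability goal of finding an
    optimal amount of prompts rather than overwhelming the writer.
    """
    groups = COLLOCATIONS.get(base.lower(), {})
    return {relation: list(collocates)[:max_per_group]
            for relation, collocates in groups.items()}
```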
Developing a writing assistant to help EAP writers with collocations in real time
Corpora have given rise to a wide range of lexicographic resources aimed at helping novice users of academic English with their writing. This includes academic vocabulary lists, a variety of textbooks, and even a bespoke academic English dictionary. However, writers may not be familiar with these resources or may not be sufficiently aware of the lexical shortcomings of their emerging texts to trigger the need to use such help in the first place. Moreover, writers who have to stop writing to look up a word can be distracted from getting their ideas down on paper. The ColloCaid project aims to address this problem by integrating information on collocation with text editors. In this paper, we share the research underpinning the initial development of ColloCaid by detailing the rationale of (1) the lexicographic database we are compiling to support novice EAP users' collocation needs and (2) the preliminary visualisation decisions taken to present information on collocation to EAP users without disrupting their writing. We conclude the paper by outlining the next steps in the research.
Designing online species identification tools for biological recording: the impact on data quality and citizen science learning
In recent years, the number and scale of environmental citizen science programmes that involve lay people in scientific research have increased rapidly. Many of these initiatives are concerned with the recording and identification of species, processes which are increasingly mediated through digital interfaces. Here, we address the growing need to understand the particular role of digital identification tools, both in generating scientific data and in supporting learning by lay people engaged in citizen science activities pertaining to biological recording communities. Starting from two well-known identification tools, namely identification keys and field guides, this study focuses on the decision-making and quality of learning processes underlying species identification tasks, by comparing three digital interfaces designed to identify bumblebee species. The three interfaces varied with respect to whether species were directly compared or filtered by matching on visual features, and whether the order of filters was directed by the interface or a user-driven open choice. A concurrent mixed-methods approach was adopted to compare how these different interfaces affected the ability of participants to make correct and quick species identifications, and to better understand how participants learned through using these interfaces. We found that the accuracy of identification and quality of learning were dependent upon the interface type, the difficulty of the specimen on the image being identified, and the interaction between interface type and 'image difficulty'. Specifically, interfaces based on filtering outperformed those based on direct visual comparison across all metrics, and an open choice of filters led to higher accuracy than the interface that directed the filtering.
Our results have direct implications for the design of online identification technologies for biological recording, irrespective of whether the goal is to collect higher quality citizen science data or to support user learning and engagement in these communities of practice.